REMARKS



Applicants have carefully reviewed this application in light of the Final Office Action
mailed August 25, 2004, and the Advisory Action mailed December 1, 2004. Applicants
appreciate the Examiner's consideration of the Application and respectfully request favorable
action in this case.



The Examiner rejected Claims 2-7, 39, 9-14, 40, 32-34, 37-38, and 40 under 35
U.S.C. §102(e) as being anticipated by U.S. Patent No. 6,463,414 ("Su").

The entire disclosure of Su is not prior art under § 102(e) because it was filed on April
12, 2000, which is almost four months after Applicants' filing date of December 15, 1999.
While Su claims priority to a provisional application filed on April 12, 1999, that provisional
application does not support the entire disclosure of Su. Su may qualify as prior art only to
the extent the disclosure is supported by Provisional Application No. 09/547,832 ("Su
Provisional Application"), which is attached for the Examiner's consideration.

Independent Claim 38 and Dependent Claims 2-7 



Independent Claim 38 recites: 

An apparatus for using a plurality of processors to 
support a media conference, comprising: 

a mixing processor operable to mix input media 
information associated with two or more first participants to 
generate output media information for communication to a 
second participant; and 

a first media transformation processor coupled to the 
mixing processor, the first media transformation processor 
operable to receive the output media information from the 
mixing processor, to encode the output media information to 
generate an output data stream, and to communicate the output 
data stream to the second participant's end-user device. 



Claim Rejections — 35 U.S.C. §102 
Su does not disclose, teach, or suggest Applicants' claimed invention because, as the
Examiner has acknowledged, Su does not disclose separate processors as recited in the 
claims. 

Independent Claim 38 requires multiple processors: a "mixing processor" and a
"first media transformation processor." The specification provides: 



Media transformation processors 12 and mixing processors 14 
represent separate hardware components. The functionality 
described below may be implemented using separate hardware 
components or software that executes using the separate 
hardware components. Thus, media transformation processor 
12 and mixing processor 14 do not operate using the same 
actual physical computing machinery. Media transformation 
processors 12 and mixing processors 14 may represent separate 
microprocessors, controllers, digital signal processors (DSPs), 
or other integrated circuit chips mounted to a circuit board. 
Alternatively, media transformation processors 12 and mixing 
processors 14 may represent separate networks of electronic 
components, such as transistors, diodes, resistors, etc., and their 
interconnections etched or imprinted on a single chip. In such 
an embodiment, media transformation processors 12 and 
mixing processors 14 may use shared resources but generally 
rely on separate pipelines to perform the majority of their 
processing. Although media transformation processors 12 and 
mixing processors 14 represent separate hardware components, 
the hardware components are not necessarily different in type. 
In a particular embodiment, media transformation processors 
12 and mixing processors 14 are implemented using the same 
type of digital signal processors. 



(p. 9). Thus, a "mixing processor" and a "first media transformation processor" represent 
separate hardware components. 

The Examiner has several times acknowledged that Su does not disclose separate 
processors. In the Office Action mailed February 25, 2004, the Examiner stated: 



Su did not specifically disclose said processors being separate as
in claim 35, being DSP as in claim 36. 



(p. 5). The Examiner repeated this statement in the Office Action mailed August 25, 2004
(p. 5).

The Examiner was correct in his observation. Su expressly states that the invention is
described in terms of "functional block components" which may be implemented using "any 
number of hardware components or software elements": 



The present invention may be described herein in terms of 
functional block components and various processing steps. It 
should be appreciated that such functional blocks may be 
realized by any number of hardware components or software 
elements configured to perform the specified functions. For 
example, the present invention may employ various integrated 
circuit components, e.g., memory elements, digital signal 
processing elements, logic elements, look-up tables, and the 
like, which may carry out a variety of functions under the 
control of one or more microprocessors or other control 
devices. 



(Col. 2, ll. 49-59) (emphasis added). Su does not specify that the functions of decoders 230
and 234, mixers 238 and 240, and encoders 232 and 236 are assigned to separate processors.
Indeed, Su provides that the functional blocks may be implemented in software.

Another passage in Su further indicates that Figure 2, on which the Examiner relies to 
support the rejections, is a "simplified schematic" of functional blocks as opposed to 
hardware components. 



FIG. 2 is a simplified schematic: there might also be certain 
additional components advantageously coupled between the 
packet network and the decoders (and encoders). Specifically, 
with respect to the decoders, there will likely be a functional
block (not shown) that receives the packets from packet
network 201 and removes all unnecessary routing, encryption,
and protection information (a "decapsulator"). Conversely, with
respect to the encoders, there will likely be a functional block
(an "encapsulator") for each encoder that receives speech
samples from the mixer and adds certain information regarding
routing, encryption, and the like prior to sending the packets 
out over packet network 201. 



(Col. 5, ll. 19-31) (emphasis added).

Furthermore, Figure 2 of Su is not prior art because it is not included in the Su
Provisional Application. Figure 2 of the attached Su Provisional Application (which may or
may not be prior art) even more clearly portrays the decoding, mixing and re-encoding as
functional blocks as opposed to separate hardware components.

For at least these reasons, Su does not disclose, teach, or suggest the "mixing
processor" and "first media transformation processor" of Claim 38. Accordingly, Applicants
respectfully request reconsideration and allowance of independent Claim 38, as well as
Claims 2-7 which depend from Claim 38.

Independent Claim 39 and Dependent Claims 9-14 

Independent Claim 39 recites: 



A method for using a plurality of processors to support 
a media conference, comprising: 

mixing input media information associated with two or 
more first participants to generate output media information for 
communication to a second participant; 

communicating the output media information from a 
mixing processor to a first media transformation processor; 

encoding the output media information to generate an 
output data stream; and 

communicating the output data stream from the first
media transformation processor to the second participant's
end-user device.



Su does not disclose, teach, or suggest Applicants' claimed invention because, as the
Examiner acknowledged in the Office Action, Su does not disclose separate processors as
recited in the claims. Like Claim 38, independent Claim 39 requires multiple processors.
Claim 39 recites the steps "mixing input media information associated with two or more first
participants to generate output media information for communication to a second
participant," "communicating the output media information from a mixing processor to a first
media transformation processor," and "encoding the output media information to generate an
output data stream." As pointed out above with respect to Claim 38, these separate
processors represent separate hardware components. Because Su and the Su Provisional
Application disclose functional blocks as opposed to separate processors, Su does not
disclose, teach, or suggest the "mixing processor" and "first media transformation processor"
of Claim 39. Accordingly, Applicants respectfully request reconsideration and allowance of
independent Claim 39, as well as Claims 9-14 which depend from Claim 39.
Independent Claim 40 and Dependent Claims 32-34 and 37

Independent Claim 40 recites: 

A system for using a plurality of processors to support a 
media conference, comprising: 

a plurality of end-user devices coupled to a data 
network and operable to generate input media information, to 
encode the input media information to generate input data 
streams, and to communicate the input data streams using the 
data network; and 

a conferencing device coupled to the data network, the 
conferencing device comprising two or more processors 
operable to decode the input data streams to generate the input 
media information, to mix the input media information to 
generate output media information, and to encode the output 
media information to generate output data streams; 

wherein the end-user devices are further operable to 
receive the output data streams and to decode the output data 
streams to generate output media information.

Su does not disclose, teach, or suggest Applicants' claimed invention because, as the
Examiner acknowledged in the Office Action, Su does not disclose separate processors as
recited in the claims. Like Claims 38 and 39, independent Claim 40 requires multiple
processors. Claim 40 recites, "the conferencing device comprising two or more processors
operable to decode the input data streams to generate the input media information, to mix the
input media information to generate output media information, and to encode the output
media information to generate output data streams." As pointed out above with respect to
Claim 38, these separate processors represent separate hardware components. Because Su
and the Su Provisional Application disclose functional blocks as opposed to separate
processors, Su does not specify that the functions of decoders 230 and 234, mixers 238 and
240, and encoders 232 and 236 are assigned to separate processors. For at least this reason, Su
does not disclose, teach, or suggest "the conferencing device comprising two or more
processors operable to decode the input data streams to generate the input media information,
to mix the input media information to generate output media information, and to encode the
output media information to generate output data streams," as recited in Claim 40.
Accordingly, Applicants respectfully request reconsideration and allowance of independent
Claim 40, as well as Claims 32-34 and 37 which depend from Claim 40.

Claim Rejections -- 35 U.S.C. §103 

The Examiner rejected Claims 5, 6, and 35-36 under 35 U.S.C. § 103 as being
unpatentable over Su in view of U.S. Patent 5,841,763 ("Leondires").

According to the Examiner, Leondires "discloses a conferencing device with separate 
processors." (p. 5). Leondires, however, does not disclose, teach, or suggest using separate 
processors for mixing and encoding. The portion of the specification cited by the Examiner 
describes audio decoding digital signal processors (ADPs) and audio encoding digital signal
processors (AEPs). The ADPs decode audio information. (Col. 14, ll. 33-43). The AEPs
mix and encode audio information: "The AEPs read the decoded audio signals from DSs
time slots, mix the decoded audio signals from each of the conferees and encode the results of
the mixing according to the particular G-series standard." (Col. 14, ll. 51-54).

In contrast to the AEPs of Leondires, Claims 38 and 39 require two separate 
processors for mixing and encoding. Claim 38 requires: (1) "a mixing processor operable to
mix input media information" and (2) "first media transformation processor operable to 
receive the output media information from the mixing processor, to encode the output media 
information to generate an output data stream, and to communicate the output data stream to 
the second participant's end-user device." Similarly, Claim 39 distinguishes between a 
mixing processor for mixing and a media transformation processor for encoding. Claim 39 
requires the following steps: "mixing input media information associated with two or more 
first participants to generate output media information for communication to a second 
participant," "communicating the output media information from a mixing processor to a first 
media transformation processor," and "encoding the output media information to generate an 
output data stream." 

For the reasons discussed above with respect to independent Claims 38, 39, and 40, as 
well as these additional reasons, Su and Leondires do not disclose Applicants' claimed 
invention recited in dependent Claims 5, 6, and 35-36. Accordingly, Applicants respectfully
request reconsideration and allowance of dependent Claims 5, 6, and 35-36.

CONCLUSION



Applicants have made an earnest attempt to place this case in condition for allowance. 
For the foregoing reasons, and for other reasons clearly apparent, Applicants respectfully 
request full allowance of pending Claims 2-7, 9-14, and 32-40. If the Examiner feels that a 
telephone conference or an interview would advance prosecution of this Application in any 
manner, the undersigned attorney for Applicants stands ready to conduct such a conference at 
the convenience of the Examiner. 

Applicants enclose a check for $790.00 to cover the filing of this Request for
Continued Examination (RCE). Applicants also enclose a check for $120.00 to cover the cost
of filing a one-month extension of time. The Commissioner is hereby authorized to charge
any other fees or credit any overpayments to Deposit Account No. 02-0384 of Baker Botts
L.L.P.



Respectfully submitted, 



BAKER BOTTS L.L.P. 



Attorneys for Applicants



Jeffery D. Baxter
Reg. No. 45,560

FIELD OF THE INVENTION
The present invention relates generally to telecommunication systems. In particular, the 
present invention relates to the processing of speech signals. More particularly, the present 
invention relates to the processing of speech signals in the context of conference call bridging. 



BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present invention may be derived by referring to
the detailed description when considered in connection with the following Figures:

FIG. 1 is a schematic representation of a conference bridging system in accordance with
the present invention;

FIG. 2 is a schematic representation of a conference bridging element in accordance with
the present invention;

FIG. 3 is a schematic representation of an exemplary configuration that may be utilized in
a practical application; and

FIG. 4 is a flow diagram of an exemplary intelligent bridging process in accordance with
the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

The following description of the preferred exemplary embodiments is not meant to limit
the scope of the present invention in any way. Those skilled in the art will recognize that
changes and modifications may be made to the preferred embodiments without departing from
the scope of the present invention. These and other changes or modifications are intended to be
included within the scope of the present invention, as broadly described herein.




Intelligent Mixing of Speech Channels for a Conference Bridge 
Description of the problem 

A conference bridge enables a conference call, as multiple input speech channels are mixed together and 
then fed into multiple output speech channels. The input speech channels are digital, and carry the speech 
information in a coded digital form. The digital form for each channel is a bit stream, generated by a
speech encoder at the remote end of the conference call. Each bit stream can be generated by a different 
speech coding standard, for example, G.711, G.726, G.729(A), or G.723.1 (at two possible bit rates). One 
possible approach for the mixing of the several speech channels is the decoding of the speech (each 
channel with its appropriate decoder), summation of the speech signal into a single channel, and re- 
encoding of the mixed channel (with the appropriate encoders) to generate the bit streams for the multiple 
output channels. 

Several problems are encountered with this direct mixing approach. The problems arise for the case of one
active talker at a given time, as well as for the case of multiple active talkers at a given time. First, even for
a single active talker, it is clear that in this approach each speech signal is coded twice, first to generate the
bit stream into the conference bridge, and then to generate the bit stream out of the conference bridge. It is
well known that this tandem coding results in a degradation of the speech. Another problem arises when
several talkers try to talk at the same time. Since low-bit rate speech coders are highly tuned for a single
talker (by using, for example, a limited order spectral model and a single pitch representation), they are
unsuitable for the coding of a signal that comprises several talkers at the same time. Another issue is
the computational complexity in the conference bridge. While several speech parameters, such as spectrum,
pitch, energy, and level of background noise, are known for each individual decoder, they have to be
re-computed by the encoder of the mixed signal.



Our solution

We propose the approach of intelligent conference bridge operation. The intelligent bridge comprises four basic
steps. In the first step, all input speech channels are aligned and a common framing is established, and
parameter extraction is performed for channels that use non-parametric coders. The second step involves
an intelligent mixing of the speech waveforms of the input channels, the third step is an intelligent
mixing of the parameters of the input speech channels, and the fourth step is an intelligent re-encoding of
the mixed output speech channels. These steps can incorporate priority assignment and speech
enhancement (for example, by noise reduction or reshaping) for each input and output channel.
The second and third steps require the modification of the standard speech decoders for their special
operation in the conference bridge, and the third and fourth steps require the modification of the standard
speech encoders for their special operation in the conference bridge.




Framing and Alignment for Speech Mixing in a Conference Bridge 



Description of the problem 

Several coded speech channels are the input into the conference bridge. The speech at each channel is
represented by a bit stream of a speech coding scheme. Not only does the format of the bit stream differ
from one coding scheme to the other, but so does the frame size: for example, from 30 ms in G.723.1 to 20 ms
in the futuristic G.4k, to 10 ms in G.729, and to 5 ms for G.728. Moreover, the input bit stream can be
coming from a frame-less speech coding approach, such as G.726 or G.711. Intelligent mixing of the
speech requires a common frame for the mixing of the parameters.

Our solution 

We propose, as a first step in intelligent operation of a conference bridge, the creation of a 'super frame',
which is the largest size frame of all of the coding schemes of the speech input channels. For example, if at
least one input channel uses the G.723.1 coder, the size of the super frame will be 30 ms. We propose the
alignment and the buffering of the short length frames to create a super frame (for example, three 10 ms
frames of G.729 to generate a 30 ms super frame suitable for intelligent mixing with G.723.1). We propose
the interpolation of the speech parameters from the aligned short length frames to the long length frames,
and from the long length frame to the aligned short length frames. We propose creating an aligned super
frame structure for the frame-less coding schemes (such as G.711, G.726). We propose a parameter
extraction and interpolation approach for the non-parametric coders (such as G.711, G.726, and G.728),
and the use of these parameters in the intelligent mixing of these coders with other coders.
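
The super-frame sizing and buffering described above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the table `FRAME_MS` copies the frame sizes quoted in the text, the names `super_frame_ms` and `frames_per_super_frame` are hypothetical, the 10 ms fallback for an all-frame-less conference is an assumption, and the sketch assumes each native frame size divides the super frame evenly.

```python
# Hypothetical table of native frame sizes in milliseconds, using the
# figures quoted in the text. None marks a frame-less coder.
FRAME_MS = {
    "G.723.1": 30,
    "G.729": 10,
    "G.728": 5,
    "G.726": None,
    "G.711": None,
}


def super_frame_ms(coders, default_ms=10):
    """Return the largest native frame size among the input channels."""
    sizes = [FRAME_MS[c] for c in coders if FRAME_MS[c] is not None]
    # Assumption: if every channel is frame-less, use a fixed default length.
    return max(sizes) if sizes else default_ms


def frames_per_super_frame(coders):
    """Number of native frames each channel buffers to fill one super frame."""
    size = super_frame_ms(coders)
    plan = {}
    for coder in coders:
        native = FRAME_MS[coder]
        if native is None:
            # Frame-less coder: one artificial frame of super-frame length.
            plan[coder] = 1
        else:
            # Assumes the native size divides the super frame evenly.
            plan[coder] = size // native
    return plan
```

With a G.729 channel and a G.723.1 channel, the plan buffers three G.729 frames per 30 ms super frame, matching the example in the text.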



'Returned-Echo' Cancellation Using Multiple Intelligent Mixing in a Conference Bridge



Description of the problem 

A conference call involves several participants. For each participant, the mixed speech information from all
the other participants should be provided. One possible solution is the (intelligent) mixing of all the
channels into a single channel, which is used as the input for each of the output encoders in the conference
bridge. The main problem with this approach is that each participant will hear his or her speech, in addition
to the speech generated by the other participants. Hearing the speech of oneself, delayed by the two-way
digital link and the conference bridge processing time, is perceived as a very annoying returned echo. For
an IP based conference bridge, the delay can be of the order of several hundred ms, and the returned echo
would be intolerable.



Our solution

We propose intelligent 'returned-echo' cancellation in a conference bridge. We propose to generate
multiple mixed signals at the conference bridge, each mix composed of all the input speech channels,
excluding the speech of one channel. The mixed signal without the contribution of a particular participant
is used as the output speech channel for that particular participant. This mixing scheme removes the
contribution of each participant from the signal that is sent back to him/her by the conference bridge, and
removes completely the returned echo effect.
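
The per-participant mixing described above can be sketched as follows, assuming the channels have already been decoded to aligned 16-bit linear PCM frames of equal length. The function names `clip16` and `mix_excluding_self` are hypothetical, and summing all channels once and then subtracting each participant's own samples is one possible way to form the mixes, chosen here for brevity.

```python
def clip16(value):
    """Saturate a sample to the signed 16-bit PCM range."""
    return max(-32768, min(32767, value))


def mix_excluding_self(channels):
    """Build one output mix per participant that omits that participant.

    channels maps a participant name to a list of decoded PCM samples;
    all lists are assumed to have the same length.
    """
    if not channels:
        return {}
    length = len(next(iter(channels.values())))
    # Sum every channel once.
    total = [0] * length
    for samples in channels.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    # Subtract each participant's own contribution from the common sum.
    return {
        name: [clip16(total[i] - samples[i]) for i in range(length)]
        for name, samples in channels.items()
    }
```

Each participant's output then contains only the other participants' speech, so no delayed copy of the participant's own voice is returned.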








Intelligent Spectral Mixing in a Conference Bridge 



Description of the problem

The speech spectrum is an important parameter for parametric speech coding. The speech spectrum is
commonly represented by the linear prediction (LP) parameters, or by one of their alternative
representations, such as the normalized autocorrelation function, the reflection coefficients, the arc-sin
parameters, the log-area ratios, the line spectral frequencies, the cosines of the line spectral frequencies, as
well as the impulse response of the LP filter. Any parametric coder, such as G.723.1 and G.729, transmits a
coded representation of the spectrum. It is well known that an accurate representation of the spectrum is
crucial for high quality speech, and that the reevaluation of the spectrum is a major source of degradation
in tandem coding of speech.

Our solution 



We propose intelligent spectral mixing for the conference bridge. The intelligent spectral mixing uses
the decoded spectral information from the multiple input channels, instead of reevaluating the spectrum of
the mixed signal. The spectra can be mixed to provide meaningful spectral information to the output
speech encoder. The spectral mixing can take into consideration the alignment, the framing, the content of
each speech input (for example, its energy), as well as timing information, such as the information about
a 'cutting in' talker. The spectral mixing can also be preset to favor a specific talker or talkers, providing
them better control over the conference call. The spectral mixing can be performed using any of the
representations for the spectrum described above. In particular, we suggest spectral mixing using the line
spectral frequencies (or the cosines of the line spectral frequencies), which are readily available in most
parametric coders, in order to reduce the complexity of the conference bridge. We also suggest obtaining a
spectral estimate for non-parametric coders, such as G.711, G.726, and G.728, and using this spectral
estimate for the intelligent mixing with parametric coders, such as G.729 and G.723.1.
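
One way spectral mixing in the line spectral frequency (LSF) domain might look is sketched below. The text does not fix a mixing rule, so the energy-weighted average used here is an assumption, as are the function name `mix_lsf` and the requirement that all channels supply LSF vectors of the same order on a common frame. A weighted average of ascending LSF vectors with non-negative weights is itself ascending, so the mixed vector keeps the ordering property that LSFs require.

```python
def mix_lsf(lsf_sets, energies):
    """Energy-weighted average of per-channel LSF vectors (assumed rule).

    lsf_sets: list of LSF vectors, one per channel, all of the same order.
    energies: list of non-negative frame energies, one per channel.
    """
    if not lsf_sets or len(lsf_sets) != len(energies):
        raise ValueError("need one energy value per LSF vector")
    order = len(lsf_sets[0])
    if any(len(lsf) != order for lsf in lsf_sets):
        raise ValueError("all LSF vectors must have the same order")
    total = sum(energies)
    if total <= 0:
        # All channels silent: fall back to equal weights.
        weights = [1.0 / len(lsf_sets)] * len(lsf_sets)
    else:
        weights = [e / total for e in energies]
    return [
        sum(w * lsf[k] for w, lsf in zip(weights, lsf_sets))
        for k in range(order)
    ]
```

A preset preference for a specific talker could be expressed by scaling that talker's energy before the weights are computed.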




Intelligent Pitch Mixing in a Conference Bridge 






Description of the problem 

The pitch is an important parameter in parametric coding of speech. Reevaluating the pitch from the
mixed signal is a simple approach to pitch determination for the mixed output signal in a conference
bridge. However, when several participants talk at the same time, the evaluated pitch value might not be
meaningful and the mixed signal will be distorted. Moreover, the reevaluation of the pitch will require
additional computation for the output channel encoders.

Our solution 



We propose intelligent pitch mixing for the conference bridge. Since each parametric coder, such as
G.729 and G.723.1, transmits a description of the pitch, we propose to use this pitch information to select a
single pitch for the output channel encoders at each time. We propose the mixing of the pitch based on the
input channels' speech and timing information. We propose this pitch mixing as either a final pitch to be
used by the channel output encoders, or as an initial pitch estimate for the channel output encoders. In
particular, we propose the selection of a single pitch, based on the energy of the input speech channels and
the pitch prediction gain, to be used as an initial estimate for the closed-loop pitch selection, common in
low bit-rate coders such as G.723.1 and G.729. We also suggest obtaining a pitch estimate for
non-parametric coders, such as G.711, G.726, and G.728, and using this pitch estimate for the intelligent
mixing with parametric coders, such as G.729 and G.723.1.
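
The single-pitch selection described above can be sketched as follows. The text names the inputs (channel energy and pitch prediction gain) but not how they are combined, so the product used as a score here is an assumption, and the names `ChannelPitch` and `select_initial_pitch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelPitch:
    pitch_lag: int      # decoded pitch lag for the frame, in samples
    energy: float       # frame energy of the decoded channel
    pitch_gain: float   # pitch prediction gain reported by the decoder


def select_initial_pitch(channels):
    """Pick one pitch lag to seed the output encoders' closed-loop search.

    Assumed rule: choose the channel with the largest energy * pitch_gain,
    i.e. the loudest channel that is also strongly periodic.
    """
    if not channels:
        return None
    best = max(channels, key=lambda c: c.energy * c.pitch_gain)
    return best.pitch_lag
```

The returned lag could serve either as the final pitch or as the starting point of the encoder's closed-loop search, as the text proposes.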



Priority Assignment in a Conference Bridge 



Description of the problem 

In a common conference call, the speech signals of all participants are mixed without any priority or 
preference of one or more participants over the others. Intelligent mixing of speech enables the assignment 
of a higher or lower priority to one or more participants, which can serve as a tool for managing and 
controlling the conference call.



Our solution

We propose to use a priority assigning algorithm in intelligent mixing of speech for a conference bridge. A
higher or lower priority of a talker can be implemented by a higher or lower weight in mixing his/her
speech parameters during parameter mixing, or by a higher or lower level of mixing of the talker's speech
waveform during the waveform mixing.
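
Priority assignment during waveform mixing can be sketched as a weighted sum, assuming aligned decoded frames of equal length. The function name `weighted_mix`, the priority values, and the default weight of 1.0 for talkers without an assigned priority are illustrative assumptions.

```python
def weighted_mix(channels, priorities, default_weight=1.0):
    """Mix decoded waveforms with a per-talker priority weight.

    channels maps a talker name to a list of samples of equal length.
    priorities maps a talker name to a mixing weight; talkers that are
    not listed receive default_weight.
    """
    if not channels:
        return []
    length = len(next(iter(channels.values())))
    mixed = [0.0] * length
    for name, samples in channels.items():
        weight = priorities.get(name, default_weight)
        for i, sample in enumerate(samples):
            mixed[i] += weight * sample
    return mixed
```

For example, giving a chairperson a weight of 2.0 and a guest 0.5 raises the chairperson's level in the mix relative to the guest's. The same weights could be applied to the speech parameters during parameter mixing.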









Background Noise Handling for a Conference Bridge 



Description of the problem 

Background noise poses a special problem for low bit-rate speech coders, which are incapable of producing
a perceptually faithful representation of most types of background noise. As more and more phone calls are
placed from mobile phones, this problem becomes more acute in modern telephony systems. It is well
known that the representation of the background noise is worse in tandem coding of speech. In a
conference bridge, the representation of the background noise is even more important, since several
sources of background noise can be mixed together into a single channel, therefore reducing the overall
signal-to-noise ratio.



Our solution

We propose a special approach for background noise handling in a conference bridge. We propose tracking
the speech and the background noise activity, the background noise level, and the background noise
statistics, for each of the incoming channels. We propose modifying the conference bridge speech decoders
and the conference bridge speech encoders to enhance the background noise mixing and representation.
We also propose to apply speech enhancement (for example, by noise reduction) to the input speech
channels and/or to the combined mixed waveform, to reduce the particular noise from each channel and
the overall noise contribution in the conference bridge.
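
Per-channel tracking of the background noise level and of speech activity might be sketched as below. The text does not specify a tracking method, so the simple minimum-following estimator used here is an assumption, as are the class name `NoiseTracker` and the parameter values `rise`, `speech_ratio`, and `floor`. One tracker instance would be kept for each incoming channel.

```python
class NoiseTracker:
    """Track a channel's noise level from per-frame energies (assumed method)."""

    def __init__(self, rise=1.05, speech_ratio=4.0, floor=1e-9):
        self.noise_level = None
        self.rise = rise                  # allowed upward drift per frame
        self.speech_ratio = speech_ratio  # energy/noise ratio treated as speech
        self.floor = floor                # keeps the estimate above zero

    def update(self, frame_energy):
        """Update the noise estimate; return True if the frame looks like speech."""
        if self.noise_level is None:
            # First frame initializes the estimate and is treated as noise.
            self.noise_level = max(self.floor, frame_energy)
            return False
        # Follow drops in energy immediately, rises only slowly.
        candidate = min(frame_energy, self.noise_level * self.rise)
        self.noise_level = max(self.floor, candidate)
        return frame_energy > self.speech_ratio * self.noise_level
```

The resulting per-channel noise levels and activity flags could then inform which channels are mixed and how much noise reduction is applied before mixing.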



